
fix(cloudflare): use raw binding format for llama-3.2-11b-vision-instruct#54

Merged

stackbilt-admin merged 2 commits into main from fix/53-llama-vision-raw-binding on Apr 27, 2026
Conversation

@stackbilt-admin
Member

Summary

  • analyzeImage() with @cf/meta/llama-3.2-11b-vision-instruct was silently returning { content: "", message: "" } when called via the Workers AI binding
  • Root cause: the binding requires { image: number[], prompt, max_tokens } — the chat/image_url format returns choices[0].message.content === null, which extractText() maps to ""
  • Fix: detect this model at dispatch time and route to runLlamaVisionRaw(), which converts base64 → number[], extracts the prompt from system + last user message, and calls ai.run() with the raw format (see the sketch after this list)
  • Other vision models (gemma-4-26b-a4b-it, llama-4-scout-17b-16e-instruct) are unaffected — they continue using the chat format
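
A minimal sketch of that raw-binding path, for reviewers who want the shape without opening the diff. The `AiBinding` interface, the simplified `ChatMessage` type, and the parameter list are assumptions for illustration; only the `{ image: number[], prompt, max_tokens }` payload and the `{ response: "..." }` return shape come from the binding behavior described above.

```ts
// Hedged sketch of runLlamaVisionRaw(); types and helper structure are
// illustrative, not the project's actual code.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string; // simplified; the real type also allows content-part arrays
}

async function runLlamaVisionRaw(
  ai: AiBinding,
  model: string,              // "@cf/meta/llama-3.2-11b-vision-instruct"
  messages: ChatMessage[],
  imageBase64: string,
  maxTokens = 512,            // default described in the follow-up commit below
): Promise<{ content: string }> {
  // The binding wants the image as an array of bytes, not base64 or a data: URL.
  const image = Array.from(Uint8Array.from(atob(imageBase64), (c) => c.charCodeAt(0)));

  // The raw format has no `messages` key, so fold system + last user message into one prompt.
  const system = messages.find((m) => m.role === "system")?.content ?? "";
  const lastUser = [...messages].reverse().find((m) => m.role === "user")?.content ?? "";
  const prompt = [system, lastUser].filter(Boolean).join("\n\n");

  const result = (await ai.run(model, { image, prompt, max_tokens: maxTokens })) as {
    response?: string;
  };

  // Raw responses come back as { response: "..." } rather than chat-style choices.
  return { content: result.response ?? "" };
}
```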

Test plan

  • New test asserts ai.run() is called with { image: number[], prompt, max_tokens } (no messages key) for llama-3.2 — a hedged sketch of this assertion follows the list
  • New test asserts result.content is populated from { response: "..." } — not empty
  • New test covers system prompt concatenation into raw prompt
  • New test covers data: URL input path
  • Existing test updated: gemma-4 still uses chat/image_url format
  • 257/257 tests pass, typecheck clean
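
Roughly what the first assertion looks like, assuming Vitest and a hand-rolled mock of the AI binding; the import path and the exact analyzeImage() call shape are assumptions, not the provider's real API surface.

```ts
import { describe, expect, it, vi } from "vitest";
// NOTE: module path and analyzeImage() signature are assumptions for illustration.
import { analyzeImage } from "../src/providers/cloudflare";

describe("llama-3.2-11b-vision-instruct raw binding", () => {
  it("calls ai.run() with { image, prompt, max_tokens } and no messages key", async () => {
    const run = vi.fn().mockResolvedValue({ response: "a red bicycle" });
    const ai = { run };

    const result = await analyzeImage(ai, {
      model: "@cf/meta/llama-3.2-11b-vision-instruct",
      prompt: "Describe this image.",
      imageBase64: "iVBORw0KGgo=", // tiny placeholder payload
    });

    const [, inputs] = run.mock.calls[0];
    expect(Array.isArray(inputs.image)).toBe(true);      // number[], not base64
    expect(typeof inputs.prompt).toBe("string");
    expect(inputs.max_tokens).toBeGreaterThan(0);
    expect(inputs).not.toHaveProperty("messages");       // raw format, not chat
    expect(result.content).toBe("a red bicycle");        // populated, not ""
  });
});
```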

Closes #53. Patch release → 1.5.1.

🤖 Generated with Claude Code

stackbilt-admin and others added 2 commits April 27, 2026 06:57
fix(cloudflare): use raw binding format for llama-3.2-11b-vision-instruct (#53)

Workers AI binding for this model requires { image: number[], prompt, max_tokens }
instead of the OpenAI-compatible messages/image_url format. The chat path returns
choices[0].message.content === null via the binding, causing extractText() to
silently return "". Other vision models are unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Guard against multiple images (silent truncation → explicit error)
- Flatten array-content user messages into raw binding prompt string
- Default max_tokens to 512 when not provided (avoids undefined)
- Expand LLAMA_VISION_RAW_MODELS comment for future maintainers
- Three new tests covering the above

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
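
The hardening points in the second commit could look roughly like the following. The content-part types and the buildRawPrompt name are assumptions; only the behaviors — explicit multi-image error and flattening array-content user messages — come from the commit message above (the 512 max_tokens default is shown in the earlier sketch).

```ts
// Hedged sketch of the follow-up hardening; types and helper name are illustrative.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

function buildRawPrompt(systemPrompt: string, userContent: string | ContentPart[]): string {
  if (typeof userContent === "string") {
    return [systemPrompt, userContent].filter(Boolean).join("\n\n");
  }

  // The raw format accepts a single image, so extra images become an explicit
  // error instead of a silent truncation.
  const imageCount = userContent.filter((p) => p.type === "image_url").length;
  if (imageCount > 1) {
    throw new Error(
      "llama-3.2-11b-vision-instruct raw binding supports one image per request",
    );
  }

  // Flatten array-content user messages into a single prompt string, since the
  // raw binding has no structured message parts.
  const userText = userContent
    .filter((p): p is Extract<ContentPart, { type: "text" }> => p.type === "text")
    .map((p) => p.text)
    .join("\n");

  return [systemPrompt, userText].filter(Boolean).join("\n\n");
}
```
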
stackbilt-admin merged commit 68f6daf into main on Apr 27, 2026
3 checks passed
stackbilt-admin deleted the fix/53-llama-vision-raw-binding branch on April 27, 2026 at 23:22

Development

Successfully merging this pull request may close these issues:

  • Cloudflare provider analyzeImage() returns empty string for Workers AI vision models